Introduction

Idea

The idea of the analysis of the datasets smashy_lcbench and smashy_super is to understand the dependencies between the hyperparameters and the target variable yval, using the implemented plots from the VisHyp package and, most importantly, without the help of any automatic optimization. We want to understand which parameter is important, i.e. has a large impact on the result, which parameter needs to be set more precisely and for which parameter the value is almost irrelevant. Furthermore, we want to understand the dependencies between the parameters themselves. Finally, we want to compare the results of the two datasets.

For each dataset, we want to examine the entire dataset and the best 20% of the yval values to get a more detailed insight into the configurations of the best results. We will partition our data with the bounded range per parameter to obtain a subset of configurations with good yval values. We will also look at this constrained parameter range using PCPs.

We will use Importance Plots, Partial Dependence Plots (PDP), Heatmaps, and Parallel Coordinate Plots (PCP) to analyze the data. Importance Plots identify the most important parameters. For a quick overview, we will use Heatmaps. For a deeper insight into the boundary structure as well as into dependencies between two parameters, we will then use PDPs. Only once the dataset has been reduced in size can we also use PCPs to get a good impression of the parameter configurations. In addition, we will look at the data using summaries to draw further conclusions.

Structure and Outline

This analysis is structured as follows. First, the dataset is prepared so that it can be used for the analyses. Then the analysis is performed, and the results are used to suggest good configuration ranges for each parameter. The analyses and deeper insights for each parameter can be selected in the Table of Contents (TOC) on the left. Before the parameter-level analyses, an overview of the dataset is provided. Finally, the results of the two datasets are compared.

Dataset: smashy_lcbench

Data Preparation

We need to load packages and subdivide the data to compare the whole dataset and the dataset with the 20% of configurations with the best result. In addition, the data must be manipulated to facilitate the use of the data for summaries and filters.

Load Data

## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

All plots from the VisHyp package require an mlr3 task object as input, so we create an mlr3 task with the selected target. For lcbench, the target is yval, a logloss performance measure. Values near 0 indicate good performance.

Create Task

lcbenchTask <- TaskRegr$new(id = "task_lcbench", backend = lcbenchSmashy, target = "yval")

lcbenchBest <- lcbenchSmashy[lcbenchSmashy$yval >= quantile(lcbenchSmashy$yval, 0.8),]
bestTask <- TaskRegr$new(id = "bestTask", backend = lcbenchBest, target = "yval")
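The quantile-based split used for the best 20% can be tried out on a small example first. The following is a minimal sketch in base R; the data frame toy and its values are hypothetical stand-ins for lcbenchSmashy, and only the filtering logic mirrors the code above.

```r
# Hypothetical toy data standing in for lcbenchSmashy.
set.seed(1)
toy <- data.frame(
  survival_fraction = runif(100),
  yval = -runif(100, min = 0.47, max = 0.96)
)

# yval is negative and values near 0 are better, so the best 20%
# are the rows at or above the 80% quantile of yval.
cutoff <- quantile(toy$yval, 0.8)
toyBest <- toy[toy$yval >= cutoff, ]

# Share of rows that survive the cut.
nrow(toyBest) / nrow(toy)
```

Because the comparison uses >=, configurations tied exactly at the cutoff are kept, so the subset can be slightly larger than 20% when yval contains ties.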

Results

The target parameter yval can reach values between -0.9647 and -0.4690. Our goal is to obtain good results, i.e., to find configurations that produce values close to -0.4690.

The most important parameter is sample. It should always be set to “bohb” rather than “random”, because 2130 of the best 2143 configurations were created with this level and the average effect on yval is much larger when “bohb” is chosen.

The next very important parameter is survival_fraction. A low value is better on average, but high values can also be good for the best configurations. Without any further restriction, a value between 0.15 and 0.5 should be chosen for high average performance. If a surrogate_learner is selected, the range should be chosen according to that surrogate_learner.

Even though the surrogate_learner parameter is not that important by itself, it influences most other parameters. This means that other parameter values should be set depending on the selected surrogate_learner whenever they affect the performance measure differently. An indication that surrogate_learner has a large impact on the other parameters was given by the Importance Plots for the partial datasets split by surrogate_learner, which assigned different importance to the individual parameters depending on the subset selected. This is especially noticeable for “bohb” samples. Parameters that should be selected depending on the chosen surrogate_learner are listed below. There are also findings on which surrogate_learner gives the best results: in the full dataset, knn1 and knn7 showed the best performance and ranger the worst. For the top cases, we saw that a disproportionate number of “bohblrn” and “ranger” configurations were filtered out. Surprisingly, bohblrn turned out to be the level of greatest importance.

knn1: survival_fraction should be above 0.5 if we are interested in the best cases. For the whole dataset, the best cases were on average below 0.5. random_interleave_fraction should be low, between 0.05 and 0.5 according to the complete dataset. budget_log_step should be chosen between -0.5 and 0.5. filter_factor_first should be below 4. filter_select_per_tournament should be above 0.9.

knn7: filter_factor_first should be below 4. survival_fraction should be between 0.1 and 1 according to both the full dataset and the subset. budget_log_step produces good performance for values between -0.5 and 1 but does not have a big impact in general. random_interleave_fraction should be between 0.25 and 0.75 according to the full dataset; in the subset it does not matter. random_interleave_random should be “FALSE”. filter_select_per_tournament should be above 0.5.

bohblrn: random_interleave_fraction is better if lower; a good value lies between 0.05 and 0.65. For survival_fraction, lower is better in the full dataset, but it does not matter for the best configurations. budget_log_step is hard to judge because of fluctuation but should be at least above -1.5. filter_algorithm should be “progressive”. filter_factor_last should be above 5. filter_factor_first should not be restricted.

ranger: random_interleave_fraction should be over 0.25. survival_fraction should be under 0.75. budget_log_step should be over -1.5.

Another important parameter for the general case is random_interleave_fraction. We found that, in general, low values below 0.3 are better for “random” samples, and values between 0.1 and 0.75 are better for “bohb” samples. However, this is only the case because the effect depends on surrogate_learner, which has many observations for the levels knn1 and knn7. For these levels, a low value must be chosen to get a good result. For the “bohb” sample, values in the middle are better, and for “ranger”, high values achieve the best yval values. For the top cases, the parameter lost importance. This could be because the opposite case, the “random” samples, is almost completely filtered out. The level of surrogate_learner did not change the behavior for the top cases (except that for bohblrn, the middle range is no longer as important).

The second most important parameter for “bohb” sampling is budget_log_step. For the full dataset, this parameter should be set between -0.5 and 1, but once a surrogate_learner is chosen, it should be set according to that learner.

filter_with_max_budget is not important in general but should always be set to “TRUE”; it is more important for “bohb” samples. Its effect is, however, important for the surrogate_learner “bohblrn” in the top cases.

filter_factor_first is the most important parameter for the top 20%. It also has a higher importance in “random” samples than in “bohb” samples. In general it should be low (under 4) for “bohb” samples and high (near to 6) for “random” samples. The parameter filter_factor_first should not be restricted if the surrogate_learner is “bohblrn.”

filter_factor_last has a low effect and should not be used to subdivide the dataset in general.

filter_select_per_tournament should not be too high in the general case, but it does not really matter for good results.

filter_algorithm and random_interleave_random have hardly any effect and can be left out of deeper investigations. They should only be considered for the surrogate_learner level “bohblrn”.

Data Constraint to Check the Results

To verify the proposed parameter configurations, we constrain the dataset and compare the obtained performance with the ranks of the performance of the whole dataset.

lcbenchEvaluation <- lcbenchSmashy[lcbenchSmashy$sample == "bohb",] 
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$surrogate_learner == "bohblrn",] 
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$random_interleave_fraction > 0.05 & lcbenchEvaluation$random_interleave_fraction < 0.65,] 
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$budget_log_step > -1.5,]
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$filter_with_max_budget == "TRUE",]
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$filter_algorithm == "progressive",]
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$filter_factor_last > 5,]

lcbenchYval <- sort(lcbenchEvaluation$yval, decreasing = TRUE)
lcbenchYvalOriginal <- sort(lcbenchSmashy$yval, decreasing = TRUE)
sort(match(lcbenchYval, lcbenchYvalOriginal), decreasing = FALSE)
##  [1]    5    6   11   12   13   16   17   21   23   25   26   28   29   30   35
## [16]   37   44   47   53   55   57   64   75   82   84  112  117  129  139  178
## [31]  182  214  247 1153 2896 3181 3944 3961 5161 5997 6095 6635 6953 7318 7450
## [46] 7707 7930 8208 8212
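The rank lookup above combines sort and match. A minimal sketch with hypothetical scores (fullY and subsetY are made up for illustration) shows how the ranks arise:

```r
# Hypothetical scores: the full set and a constrained subset of it.
fullY <- c(-0.60, -0.47, -0.55, -0.50, -0.52, -0.48)
subsetY <- c(-0.50, -0.47)

# Sort the full set from best (highest) to worst, then look up
# the position of each subset value in that ordering.
fullSorted <- sort(fullY, decreasing = TRUE)
ranks <- sort(match(subsetY, fullSorted))
ranks
```

Note that match returns the first matching position, so configurations with identical yval values all receive the lowest of their shared ranks.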

We can see that many good results were obtained, but by no means all of the best configurations were found. This can be explained by the fact that we often imposed constraints to reduce the size of the dataset. For example, for some categorical parameters, we always chose one level even though we knew that other levels could also yield good values. Furthermore, numerical parameters were partly restricted, although it was known that very good yval values can also be obtained outside the chosen range.

Most interestingly, we get many good results, but also some seemingly bad ones. This could be due to hidden interactions that were not found, or to inaccuracies in the constraints derived from the visualization plots. In the latter case, the poorer performance values could be due to errors in the interpretation of the plots. Difficulties with the surrogate model could also play a role if the performance values are not predicted correctly. In addition, an inappropriate grid size in a PDP can lead to inaccuracies.

Finally some metrics are used to verify the results. The importance of the metrics can be found in the bachelor thesis.

summary(lcbenchEvaluation$yval)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.6003 -0.5262 -0.4798 -0.5026 -0.4748 -0.4713
#proportion
length(lcbenchEvaluation$yval)/length(lcbenchSmashy$yval)
## [1] 0.004574309
#top configurations
sum(lcbenchYval >= quantile(lcbenchSmashy$yval, 0.95))/length(lcbenchYval)
## [1] 0.6734694
#quantile
sum(lcbenchSmashy$yval<=max(lcbenchYval))/length(lcbenchSmashy$yval)
## [1] 0.9996266
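The three metrics above can be bundled into one helper so that different constraint sets are compared in the same way. This is a sketch only: the function name evaluateSubset, its arguments, and the toy vectors are hypothetical and not part of the original analysis.

```r
# Summarise how a constrained subset performs relative to the full data.
# subsetY and fullY are numeric vectors of yval (higher is better).
evaluateSubset <- function(subsetY, fullY, topQuantile = 0.95) {
  c(
    # share of all configurations kept by the constraints
    proportion = length(subsetY) / length(fullY),
    # share of the kept configurations that lie in the top 5% overall
    shareInTop = mean(subsetY >= quantile(fullY, topQuantile)),
    # quantile of the full data reached by the best kept configuration
    bestQuantile = mean(fullY <= max(subsetY))
  )
}

# Hypothetical example: ten scores, the subset holds the best and a mid value.
fullY <- -(10:1) / 10
subsetY <- c(-0.1, -0.5)
result <- evaluateSubset(subsetY, fullY)
result
```

With the real data, the call would be evaluateSubset(lcbenchEvaluation$yval, lcbenchSmashy$yval).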

Visual Overview

With the implemented PCP, our results can be checked visually.

Limitation to Good Configurations

knitr::include_graphics("D:/Simon/Desktop/Studium/6. Semester/Bachelorarbeit/Latex/Grafiken/lcbench_Best_PCP.png")

Limitation to Bad Configurations

knitr::include_graphics("D:/Simon/Desktop/Studium/6. Semester/Bachelorarbeit/Latex/Grafiken/lcbench_Bad_PCP.png")

Overview

For visual analysis it is important to know the configuration spaces and the class of parameters.

Structure

str(lcbenchSmashy)
## 'data.frame':    10712 obs. of  12 variables:
##  $ budget_log_step             : num  0.1145 -0.4292 0.0482 0.8538 -1.4559 ...
##  $ survival_fraction           : num  0.261 0.3376 0.0149 0.7322 0.8552 ...
##  $ surrogate_learner           : Factor w/ 4 levels "bohblrn","knn1",..: 3 3 3 1 3 2 1 2 3 3 ...
##  $ filter_with_max_budget      : Factor w/ 2 levels "FALSE","TRUE": 1 2 2 2 2 1 1 1 2 1 ...
##  $ filter_factor_first         : num  0.234 3.756 1.002 0.437 0.672 ...
##  $ random_interleave_fraction  : num  0.225 0.104 0.542 0.489 0.516 ...
##  $ random_interleave_random    : Factor w/ 2 levels "FALSE","TRUE": 2 2 1 1 1 1 1 1 2 1 ...
##  $ sample                      : Factor w/ 2 levels "bohb","random": 1 2 2 1 1 2 1 2 2 2 ...
##  $ filter_factor_last          : num  0.387 1.589 2.927 5.775 6.422 ...
##  $ filter_algorithm            : Factor w/ 2 levels "progressive",..: 1 1 1 1 2 1 1 1 2 2 ...
##  $ filter_select_per_tournament: num  2.27 2.3 1.93 1.52 0.51 ...
##  $ yval                        : num  -0.499 -0.535 -0.54 -0.475 -0.506 ...

We want to look at the importance for the whole dataset (general case) and for the best configurations (top 20%).

Importance General

plotImportance(lcbenchTask)

Importance Best

plotImportance(bestTask)

For the general case, sample is the most important hyperparameter. The random_interleave_random parameter is of little importance. For the best configurations, filter_factor_first and filter_factor_last are the most important parameters and the sample parameter is no longer of importance. The ranking of the parameters has changed a lot, but the value of the importance measure has hardly changed for the parameters except for the sample parameter. We also look at a PCP:

plotParallelCoordinate(lcbenchTask)

It can be seen that there are too many observations to see much. The PCP makes more sense with fewer observations. After dividing the data, we first look for structural changes.

Summary All

summary(lcbenchSmashy)
##  budget_log_step   survival_fraction   surrogate_learner filter_with_max_budget
##  Min.   :-1.7528   Min.   :0.0000686   bohblrn:1372      FALSE:4801            
##  1st Qu.:-1.0795   1st Qu.:0.1877029   knn1   :3111      TRUE :5911            
##  Median :-0.4192   Median :0.3602689   knn7   :4803                            
##  Mean   :-0.3839   Mean   :0.4179906   ranger :1426                            
##  3rd Qu.: 0.3110   3rd Qu.:0.6339252                                           
##  Max.   : 1.0196   Max.   :0.9998031                                           
##  filter_factor_first random_interleave_fraction random_interleave_random
##  Min.   :0.000763    Min.   :0.0000227          FALSE:5008              
##  1st Qu.:2.791122    1st Qu.:0.1496729          TRUE :5704              
##  Median :4.452371    Median :0.3419693                                  
##  Mean   :4.139002    Mean   :0.3893602                                  
##  3rd Qu.:5.690380    3rd Qu.:0.6082803                                  
##  Max.   :6.907525    Max.   :0.9999744                                  
##     sample     filter_factor_last    filter_algorithm
##  bohb  :8763   Min.   :0.000763   progressive:3882   
##  random:1949   1st Qu.:2.462215   tournament :6830   
##                Median :4.267029                      
##                Mean   :3.960315                      
##                3rd Qu.:5.569787                      
##                Max.   :6.907578                      
##  filter_select_per_tournament      yval        
##  Min.   :0.001612             Min.   :-0.9647  
##  1st Qu.:1.000000             1st Qu.:-0.5923  
##  Median :1.000000             Median :-0.5377  
##  Mean   :1.086512             Mean   :-0.5646  
##  3rd Qu.:1.228722             3rd Qu.:-0.5189  
##  Max.   :2.397413             Max.   :-0.4690

Summary Best 20%

summary(lcbenchBest)
##  budget_log_step   survival_fraction  surrogate_learner filter_with_max_budget
##  Min.   :-1.7503   Min.   :0.000095   bohblrn: 130      FALSE: 731            
##  1st Qu.:-1.0406   1st Qu.:0.170492   knn1   : 796      TRUE :1412            
##  Median :-0.3780   Median :0.332510   knn7   :1161                            
##  Mean   :-0.3321   Mean   :0.381662   ranger :  56                            
##  3rd Qu.: 0.3890   3rd Qu.:0.523938                                           
##  Max.   : 1.0195   Max.   :0.999789                                           
##  filter_factor_first random_interleave_fraction random_interleave_random
##  Min.   :0.004248    Min.   :0.0000964          FALSE:1020              
##  1st Qu.:3.643269    1st Qu.:0.1208691          TRUE :1123              
##  Median :4.845318    Median :0.2392768                                  
##  Mean   :4.546724    Mean   :0.3170039                                  
##  3rd Qu.:5.870564    3rd Qu.:0.4727989                                  
##  Max.   :6.907525    Max.   :0.9979292                                  
##     sample     filter_factor_last    filter_algorithm
##  bohb  :2130   Min.   :0.004248   progressive: 798   
##  random:  13   1st Qu.:3.101750   tournament :1345   
##                Median :4.634717                      
##                Mean   :4.263191                      
##                3rd Qu.:5.721979                      
##                Max.   :6.907525                      
##  filter_select_per_tournament      yval        
##  Min.   :0.002426             Min.   :-0.5160  
##  1st Qu.:1.000000             1st Qu.:-0.5126  
##  Median :1.000000             Median :-0.5082  
##  Mean   :1.064477             Mean   :-0.5047  
##  3rd Qu.:1.101817             3rd Qu.:-0.4995  
##  Max.   :2.396205             Max.   :-0.4690

surrogate_learner: A disproportionate number of “bohblrn” and “ranger” configurations were filtered out. This could mean that these learners perform worse on average. filter_with_max_budget: Proportionally more “FALSE” configurations were filtered out. This could mean that “TRUE” performs better on average. sample: Only 13 rows of the best 20% of configurations use “random” sampling; the other 2130 use “bohb” sampling. This is also why the parameter sample has no importance for the subdivided data frame: there are barely any configurations with the level “random” left.

The hyperparameters will be examined more precisely in the following sections.
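To make "disproportionate" concrete, the counts from the two summaries can be turned into retention rates per level. The counts below are copied from the summaries above; the variable names are chosen for illustration only.

```r
# surrogate_learner counts from summary(lcbenchSmashy) and summary(lcbenchBest).
fullCounts <- c(bohblrn = 1372, knn1 = 3111, knn7 = 4803, ranger = 1426)
bestCounts <- c(bohblrn = 130, knn1 = 796, knn7 = 1161, ranger = 56)

# Share of each level that survives the cut to the best 20%.
retention <- bestCounts / fullCounts
round(retention, 3)

# Overall retention for comparison (about 0.2 by construction).
sum(bestCounts) / sum(fullCounts)
```

Levels whose retention is clearly below the overall rate of about 0.2 are the ones that were filtered out in disproportionate numbers.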

Examination of the Parameters

sample

As we have seen, sample is the most important parameter in the full dataset, so it must be set correctly to perform well. Let’s look at the effect of the parameter in a PDP. We also check whether the effect holds across all other parameters, using Heatmaps to get a quick overview of the interactions. Values close to 1 have hardly any effect on the result.

PDP

plotPartialDependence(lcbenchTask, features = c("sample"), rug = FALSE, plotICE = FALSE)

Heatmap

subplot(
plotHeatmap(lcbenchTask, features = c("sample", "budget_log_step"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "survival_fraction"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "surrogate_learner"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "filter_with_max_budget"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "filter_factor_first"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "random_interleave_fraction"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "random_interleave_random"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "filter_factor_last"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "filter_algorithm"), rug = FALSE),
plotHeatmap(lcbenchTask, features = c("sample", "filter_select_per_tournament"), rug = FALSE),
nrows = 5,shareX = TRUE)

PDP: On average, “bohb” samples always lead to better target values than “random” samples.

Heatmaps: Note that survival_fraction and random_interleave_fraction may give better results if a lower value is chosen. Also, the surrogate_learner levels knn1 and knn7 seem to give better results. On average, the “bohb” sample is better, but let’s look at the best results and the combination of their instances.

We want to look at only the best configurations and verify that mostly “bohb” samples occur. Therefore we split the dataset into “bohb” and “random” samples.

random <- lcbenchSmashy[lcbenchSmashy$sample == "random",]
bohb <- lcbenchSmashy[lcbenchSmashy$sample == "bohb",]

randomTask <- TaskRegr$new(id = "task_random", backend = random, target = "yval")
bohbTask <- TaskRegr$new(id = "task_bohb", backend = bohb, target = "yval")

We split the entire dataset because we assume differences between “random” and “bohb” samples: among the best configurations, many “random” samples were filtered out and the parameter sample lost a lot of importance. In what follows, we focus primarily on the “bohb” sample. For the best 20% of configurations, we focus on “bohb” only.

Let’s check if there are differences in importance for the parameters in the “random” subset and the “bohb” subset.

Subset bohb

plotImportance(bohbTask)

Subset random

plotImportance(randomTask)

The hyperparameter survival_fraction is the most important parameter. Also random_interleave_fraction has high importance for both subsets. The parameters filter_algorithm and random_interleave_random do not seem to be important at all.

Bohb sample: The parameter budget_log_step is now more important. In the first plot, this parameter was not ranked that high, so we can assume that it is very important for this subset. The importance of the other parameters has not changed much compared to the full data, but the hyperparameters surrogate_learner and filter_with_max_budget are more important than for “random” samples.

Random sample: It looks like the right parameter configuration is more important in the “random” sample, because the parameter importance values are in general higher than in the “bohb” sample. The parameters filter_factor_last and filter_factor_first have a higher importance in the “random” sample.

Top 20%

We could see in the beginning that most of the good results were gained with “bohb” samples. That’s why we will focus on “bohb” samples only from now on. That is, we remove the 13 rows of “random” samples from the underlying data.

bohbBest <- bohb[bohb$yval >= quantile(bohb$yval, 0.8),]
bohbBestTask <- TaskRegr$new(id = "bohbBestTask", backend = bohbBest, target = "yval")

survival_fraction

The survival_fraction parameter is the most important parameter for both samples of the entire dataset. With a PDP, we can gain better insight into how the parameter should be configured.

Subset bohb

plotPartialDependence(bohbTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE) 

Subset random

plotPartialDependence(randomTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE)

In general, lower values achieve better performance than higher values. For the “bohb” subset, the best range seems to be between 0.15 and 0.6. This means that too low a value is not so good in this case. For the “random” subset, the curve is almost monotonically decreasing, which means that lower values are always better.

Top 20%

One way to find reasons for this structure is to filter the dataset again. To do so, we restrict the data to the best 20% of yval values among the “bohb” samples.

plotPartialDependence(bohbBestTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE, gridsize = 20)

In this case, higher values seem to be somewhat better. This is surprising, since in the general case low values were more important. It could mean that with good configurations of other parameters, the survival_fraction parameter even gives better results when a high value is chosen. This could also explain the increase in the range between 0.5 and 0.75. Looking at the rug, we see that most configurations were made below 0.5 and the fewest configurations were made above 0.75. Because of the few configurations with high values, the effect of good performances in this range is less strong. In the range between 0.5 and 0.75, there are more configurations, which therefore have a greater impact on the average curve. However, the difference on the y-axis is only small and therefore it cannot be said that high values are better.
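What the rug shows visually can also be checked numerically by counting configurations per range. The sketch below uses hypothetical, randomly generated values (the vector survival is not the real data); the bin edges follow the ranges discussed above.

```r
set.seed(42)
# Hypothetical survival_fraction values, skewed toward low values.
survival <- rbeta(500, shape1 = 1.2, shape2 = 2)

# Count how many configurations fall into each range of interest.
bins <- cut(survival, breaks = c(0, 0.5, 0.75, 1), include.lowest = TRUE)
binCounts <- table(bins)
binCounts
```

A range with few observations contributes an average based on little data, so apparent effects there should be read with caution.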

surrogate_learner

Another important parameter for the “bohb” subset is surrogate_learner.

plotPartialDependence(bohbTask, features = c("surrogate_learner"), rug = FALSE, plotICE = FALSE)

In this graphic, knn1 and knn7 seem to be the best choices based on the results so far. For a more detailed analysis, we should divide the data by surrogate_learner again and check whether there are differences in the importance of the remaining parameters.

knn1 <- bohb[bohb$surrogate_learner == "knn1",] 
knn7 <- bohb[bohb$surrogate_learner == "knn7",] 
bohblrn <- bohb[bohb$surrogate_learner == "bohblrn",]
ranger <- bohb[bohb$surrogate_learner == "ranger",]

knn1Task <- TaskRegr$new(id = "knn1Task", backend = knn1, target = "yval")
knn7Task <- TaskRegr$new(id = "knn7Task", backend = knn7, target = "yval")
bohblrnTask <- TaskRegr$new(id = "bohblrnTask", backend = bohblrn, target = "yval")
rangerTask <- TaskRegr$new(id = "rangerTask", backend = ranger, target = "yval")
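The four subsets above are created one by one. As an alternative sketch, base R's split produces all per-level data frames in one step; toyBohb below is a hypothetical stand-in for the bohb data frame.

```r
# Hypothetical stand-in for the bohb data frame.
toyBohb <- data.frame(
  surrogate_learner = factor(c("knn1", "knn7", "bohblrn", "ranger", "knn7", "knn1")),
  yval = c(-0.50, -0.49, -0.55, -0.60, -0.51, -0.48)
)

# One data frame per factor level, returned as a named list.
byLearner <- split(toyBohb, toyBohb$surrogate_learner)

# Mean yval per learner as a quick sanity check on the split.
meanYval <- sapply(byLearner, function(d) mean(d$yval))
meanYval
```

Each list element, for example byLearner$knn1, could then serve as the backend of a task in the same way as the individually created data frames above.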

Subset: knn1

plotImportance(knn1Task)

Subset: knn7

plotImportance(knn7Task)

Subset: bohblrn

plotImportance(bohblrnTask)

Subset: ranger

plotImportance(rangerTask)

The parameter survival_fraction is very important for the “bohblrn” and “knn1” subset. This could already be seen in the PDP for survival_fraction. The hyperparameter random_interleave_fraction has high importance for all surrogate_learners. For the factor “knn7” the parameter budget_log_step seems to be more important than for other factors of the surrogate_learner parameter. To check why the importance differs and whether the parameters have different good ranges, let’s take a closer look at 3 very important parameters. We use ICE curves here to gain further insight. Later we check each factor separately for the top 20% of the configuration to find differences.

knn1: random_interleave_fraction

plotPartialDependence(knn1Task, "random_interleave_fraction", plotICE = FALSE)

knn7: random_interleave_fraction

plotPartialDependence(knn7Task, "random_interleave_fraction", plotICE = FALSE)

bohblrn: random_interleave_fraction

plotPartialDependence(bohblrnTask, "random_interleave_fraction", plotICE = FALSE)

ranger: random_interleave_fraction

plotPartialDependence(rangerTask, "random_interleave_fraction", plotICE = FALSE)

For “knn1”, lower random_interleave_fraction values seem to be better. For “knn7” and “bohblrn”, the random_interleave_fraction values should be neither too high nor too low, and for “ranger”, higher values lead to better yval results. A good range for “bohblrn” seems to be between 0.05 and 0.65. For “knn1”, a value between 0.05 and 0.5 seems good. A good range for “knn7” seems to be between 0.25 and 0.75.

knn1: survival_fraction

plotPartialDependence(knn1Task, "survival_fraction", plotICE = FALSE)

knn7: survival_fraction

plotPartialDependence(knn7Task, "survival_fraction", plotICE = FALSE)

bohblrn: survival_fraction

plotPartialDependence(bohblrnTask, "survival_fraction", plotICE = FALSE)

ranger: survival_fraction

plotPartialDependence(rangerTask, "survival_fraction", plotICE = FALSE)

Low values for survival_fraction are better in general and could be set below 0.5, but high values are worst for “bohblrn”. For the surrogate_learner “knn7”, values around 0.5 seem to produce the best performance; for “knn1”, a good choice is between 0.1 and 0.6. For all other levels, values below 0.5 are better.

knn1: budget_log_step

plotPartialDependence(knn1Task, "budget_log_step", gridsize = 40, plotICE = FALSE)

knn7: budget_log_step

plotPartialDependence(knn7Task, "budget_log_step", gridsize = 40, plotICE = FALSE)

bohblrn: budget_log_step

plotPartialDependence(bohblrnTask, "budget_log_step", plotICE = FALSE)

ranger: budget_log_step

plotPartialDependence(rangerTask, "budget_log_step", plotICE = FALSE)

It is very interesting that the line for the parameter budget_log_step shows repeated dips. This occurs only for the levels “knn7” and “knn1”. The range is hard to identify since it also depends on the gridsize of the plot. A value above -0.5 is a good choice for “knn7” and “ranger”. For “bohb” there are repeated dips, but the value should be above -0.5. For “knn1” and “knn7”, values between -0.5 and 1 seem to achieve good results.
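Why the grid size matters for narrow dips can be illustrated with a toy response curve. The function response and both grids are hypothetical; the sketch only shows that a coarse grid can step over a dip that a fine grid captures.

```r
# Hypothetical response with a narrow dip around x = 0.33.
response <- function(x) -0.5 - 0.05 * exp(-((x - 0.33) / 0.02)^2)

coarse <- seq(0, 1, length.out = 6)    # grid step 0.2
fine <- seq(0, 1, length.out = 101)    # grid step 0.01

# Lowest value seen on each grid.
minCoarse <- min(response(coarse))
minFine <- min(response(fine))
c(coarse = minCoarse, fine = minFine)
```

On the coarse grid the curve looks almost flat, while the fine grid reveals the dip. Comparing several gridsize values is therefore a reasonable check before fixing a parameter range.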

Top 20%

We also want to investigate the best cases and, for this, directly check the subdivided datasets. We will search for and analyze the most important parameters with the Importance Plot. In addition, we will examine abnormalities in the PCP in more detail and also look at some summaries.

plotPartialDependence(bestTask, features = c("surrogate_learner"), rug = FALSE, plotICE = FALSE)

The level “bohblrn” of the surrogate_learner parameter is now the most important, and the level “ranger” is clearly more important now.

surrogate_learner bohblrn

Let’s investigate the surprising outcome for the surrogate_learner level “bohblrn”.

bohblrnBest <- bohbBest[bohbBest$surrogate_learner == "bohblrn",]

bohblrnTaskBest <- TaskRegr$new(id = "bohblrnTaskBest", backend = bohblrnBest, target = "yval")

PCP bohblrn

plotParallelCoordinate(bohblrnTaskBest, labelangle = 10)

Importance Plot bohblrn

plotImportance(bohblrnTaskBest)

PCP: A high value for the filter_factor_last parameter could be better, since many lines pass through this range and reach high yval values. The filter_with_max_budget parameter should be set to “TRUE” and the parameter filter_algorithm should be set to “progressive”. It looks like high budget_log_step values achieve the best results. The parameter filter_factor_first should not be restricted.

Importance Plot: In the general case for bohblrn, survival_fraction was the most important parameter (by far!); now it is budget_log_step and filter_with_max_budget.

Let’s investigate why the survival_fraction parameter lost importance.

bohblrn: full Dataset

plotPartialDependence(bohblrnTask, "survival_fraction")

bohblrn: subdivided Dataset

plotPartialDependence(bohblrnTaskBest, "survival_fraction")

Previously, a high survival_fraction led to a drop, but one can see that it does not affect the very good results! Here we can see why ICE curves can be a useful addition to the PDP.
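The relationship between ICE curves and the PDP can be sketched in a few lines of base R. This is a generic illustration under stated assumptions, not the VisHyp implementation: the toy data, the linear model used as surrogate, and all variable names are hypothetical.

```r
set.seed(7)
# Hypothetical data: the effect of x depends on a second variable g.
n <- 200
toy <- data.frame(x = runif(n), g = rep(c(0, 1), each = n / 2))
toy$y <- ifelse(toy$g == 1, -0.5 * toy$x, 0.1 * toy$x) + rnorm(n, sd = 0.01)

# Simple surrogate model with an interaction term (an assumption;
# any fitted regression model could take its place).
fit <- lm(y ~ x * g, data = toy)

# ICE: for each observation, predict along a grid of x values
# while keeping that observation's other features fixed.
grid <- seq(0, 1, length.out = 11)
ice <- sapply(grid, function(v) {
  tmp <- toy
  tmp$x <- v
  predict(fit, newdata = tmp)
})

# PDP: the pointwise average of all ICE curves (one value per grid point).
pdp <- colMeans(ice)
```

In this toy example the averaged PDP falls with x, although the individual curves for g = 0 rise. An average effect can therefore hide subgroups that behave differently, which is the kind of pattern ICE curves make visible.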

Let us examine the other important parameters from the PCP and Importance Plot for the “bohblrn” level of surrogate_learner.

bohblrn: PDP budget_log_step

plotPartialDependence(bohblrnTaskBest, "budget_log_step", gridsize = 30, plotICE = FALSE)

bohblrn: PDP filter_with_max_budget

plotPartialDependence(bohblrnTaskBest, "filter_with_max_budget")

bohblrn: PDP filter_factor_last

plotPartialDependence(bohblrnTaskBest, "filter_factor_last", plotICE = FALSE)

bohblrn: PDP filter_algorithm

plotPartialDependence(bohblrnTaskBest, "filter_algorithm")
summary(bohblrnBest$filter_algorithm)
## progressive  tournament 
##          63          54
summary(bohblrn$filter_algorithm)
## progressive  tournament 
##         278         590

In general, budget_log_step performs better with higher values, but the poorer predictions barely increase with higher values. There are also small drops around -0.3 to 0.5.

The parameter filter_with_max_budget should be set to “TRUE”. More observations with “TRUE” than with “FALSE” remain in the subset. Proportionally more “FALSE” configurations have already been filtered out, which is another indication that “TRUE” is the better choice for yval.

For the parameter filter_factor_last, high values may give the best results: even though the differences are small, there are more observations than in other ranges. A good choice for a configuration is a value over 5.

The hypothesis that filter_algorithm should be “progressive” can be confirmed. The Partial Dependence Plot doesn’t show it, but many “tournament” configurations were filtered out: 63 of 278 “progressive” configurations remain (about 23%), compared with 54 of 590 “tournament” configurations (about 9%).
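The filtering argument for filter_algorithm can be checked with a few lines of base R. The counts are copied from the two summary() outputs above; the vector names `full`, `best` and `retained` are only chosen for this sketch.

```r
# Counts copied from summary(bohblrn$filter_algorithm) and
# summary(bohblrnBest$filter_algorithm).
full <- c(progressive = 278, tournament = 590)
best <- c(progressive = 63, tournament = 54)

# Share of each level that survives the cut to the subdivided dataset.
retained <- best / full
round(retained, 3)

# Ratio of the two shares: how much more often "progressive" survives.
retained["progressive"] / retained["tournament"]
```

The same comparison could be made for any categorical parameter, as long as the counts of both the full and the subdivided dataset are available.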

surrogate_learner knn1

Let’s investigate the surprising outcome of the surrogate_learner class knn1.

knn1Best <- bohbBest[bohbBest$surrogate_learner == "knn1",]

knn1BestTask <- TaskRegr$new(id = "knn1BestTask", backend = knn1Best, target = "yval")
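Since the same subsetting step is repeated for every surrogate_learner, it could also be done in one pass. The following sketch uses a small made-up data frame `toyBest` as a stand-in for bohbBest (its values and its reduced set of columns are assumptions) and plain base R.

```r
# Toy stand-in for bohbBest (assumption: the real data has more columns).
toyBest <- data.frame(
  surrogate_learner = factor(c("knn1", "knn7", "ranger", "knn1", "bohblrn", "knn7")),
  survival_fraction = c(0.6, 0.2, 0.4, 0.8, 0.3, 0.1),
  yval = c(-0.52, -0.50, -0.49, -0.51, -0.48, -0.53)
)

# One data frame per learner, named by factor level.
perLearner <- split(toyBest, toyBest$surrogate_learner)

# Optional: drop the now constant column so it is not treated as a feature.
perLearner <- lapply(perLearner, function(d) d[, names(d) != "surrogate_learner"])

names(perLearner)
sapply(perLearner, nrow)
```

Each list element could then be passed as the backend to TaskRegr$new() in the same way as knn1Best above.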

PCP knn1

plotParallelCoordinate(knn1BestTask, labelangle = 10)

Importance Plot knn1

plotImportance(knn1BestTask)

PCP: The parameter filter_with_max_budget should be set to “TRUE”. There seem to be specific ranges of budget_log_step that bring better results. The hyperparameter survival_fraction should be high and the parameter random_interleave_fraction should be low for good results. High filter_factor_last values could be better, since many lines there result in high yval values. The parameter filter_select_per_tournament should be set to 1.

Importance Plot: The parameters filter_factor_first, survival_fraction and filter_factor_last are the most important according to the Importance Plot.

The parameters that stand out in the PCP and the Importance Plot should be examined.

knn1: PDP filter_factor_first

plotPartialDependence(knn1BestTask, "filter_factor_first", plotICE = FALSE)

knn1: PDP survival_fraction

plotPartialDependence(knn1BestTask, "survival_fraction", plotICE = FALSE)

knn1: PDP filter_factor_last

plotPartialDependence(knn1BestTask, "filter_factor_last", plotICE = FALSE)

knn1: PDP filter_with_max_budget

plotPartialDependence(knn1BestTask, "filter_with_max_budget")

knn1: PDP budget_log_step

plotPartialDependence(knn1BestTask, "budget_log_step", plotICE = FALSE)

knn1: PDP filter_select_per_tournament

plotPartialDependence(knn1BestTask, "filter_select_per_tournament", plotICE = FALSE)

knn1: PDP random_interleave_fraction

plotPartialDependence(knn1BestTask, "random_interleave_fraction", plotICE = FALSE)

In general, the parameter filter_factor_first seems to produce better results at low values; the best results are in configuration ranges under 4. The variable survival_fraction should get a value over 0.5 (interesting, because in the general case lower values were better). The hyperparameters filter_factor_last and random_interleave_fraction don’t really tell us where the best configurations are.

surrogate_learner knn7

knn7Best <- bohbBest[bohbBest$surrogate_learner == "knn7",]

knn7BestTaskBest <- TaskRegr$new(id = "knn7Task", backend = knn7Best, target = "yval")

PCP knn7

plotParallelCoordinate(knn7BestTaskBest, labelangle = 10)

Importance Plot knn7

plotImportance(knn7BestTaskBest)

PCP: filter_algorithm should be “tournament”. filter_factor_first should be around 4. random_interleave_random should be “FALSE”. survival_fraction should apparently be low. The parameter filter_with_max_budget should be set to “TRUE”. The hyperparameter random_interleave_fraction should get a low value and the parameter filter_select_per_tournament should get a value around 1.

Importance Plot: The most important parameters are filter_factor_first, filter_factor_last and budget_log_step.

knn7: PDP filter_factor_first

plotPartialDependence(knn7BestTaskBest, "filter_factor_first", plotICE = FALSE)

knn7: PDP filter_factor_last

plotPartialDependence(knn7BestTaskBest, "filter_factor_last", plotICE = FALSE)

knn7: PDP budget_log_step

plotPartialDependence(knn7BestTaskBest, "budget_log_step", plotICE = FALSE)

knn7: PDP filter_algorithm

plotPartialDependence(knn7BestTaskBest, "filter_algorithm", plotICE = FALSE)

knn7: PDP random_interleave_random

plotPartialDependence(knn7BestTaskBest, "random_interleave_random")

knn7: PDP survival_fraction

plotPartialDependence(knn7BestTaskBest, "survival_fraction", plotICE = FALSE)

knn7: PDP random_interleave_fraction

plotPartialDependence(knn7BestTaskBest, "random_interleave_fraction", plotICE = FALSE)

knn7: PDP filter_select_per_tournament

plotPartialDependence(knn7BestTaskBest, "filter_select_per_tournament", plotICE = FALSE)

knn7: PDP filter_with_max_budget

plotPartialDependence(knn7BestTaskBest, "filter_with_max_budget")

The parameter filter_factor_first should be under 4. The parameter budget_log_step produces its best values over 0.5 but does not have a big impact in general. Again, we don’t see a clear best range for filter_factor_last and random_interleave_fraction, and we cannot confirm with certainty that “tournament” is always better. random_interleave_random should be “FALSE”. filter_select_per_tournament should be over 0.5. filter_with_max_budget should be “TRUE”.

surrogate_learner ranger

Finally, the “ranger” learner should be investigated, since its average performance for good configurations increased a lot.

rangerBest <- bohbBest[bohbBest$surrogate_learner == "ranger",]

rangerBestTaskBest <- TaskRegr$new(id = "rangerBestTask", backend = rangerBest, target = "yval")

PCP ranger

plotParallelCoordinate(rangerBestTaskBest, labelangle = 10)

Importance Plot ranger

plotImportance(rangerBestTaskBest)

PCP: budget_log_step should be high. filter_with_max_budget should be “TRUE”.

Importance Plot: The most important parameters are filter_factor_first, filter_with_max_budget and budget_log_step.

ranger: PDP filter_factor_first

plotPartialDependence(rangerBestTaskBest, "filter_factor_first", plotICE = FALSE)

ranger: PDP budget_log_step

plotPartialDependence(rangerBestTaskBest, "budget_log_step", plotICE = FALSE)

ranger: PDP filter_with_max_budget

plotPartialDependence(rangerBestTaskBest, "filter_with_max_budget", plotICE = FALSE)

A high budget_log_step and a low filter_factor_first seem to produce the best performance. For budget_log_step a value over -0.5 seems to be good; for filter_factor_first a value under 2.5 performs best. Note that only around 45 observations are left, so the interpretation is not that clear. The parameter filter_with_max_budget should be set to “TRUE”.
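To get a feeling for how much uncertainty 45 observations carry, one could bootstrap a summary statistic. The sketch below uses simulated yval values (mean and spread are made up and are not taken from the ranger subset), so only the procedure, not the numbers, transfers to the real data.

```r
set.seed(42)

# Simulated yval for 45 configurations (assumption: values are made up).
yval <- rnorm(45, mean = -0.5, sd = 0.05)

# Bootstrap: resample the 45 values with replacement and recompute the mean.
bootMeans <- replicate(2000, mean(sample(yval, replace = TRUE)))

# Approximate 95% interval for the mean yval of such a small subset.
quantile(bootMeans, c(0.025, 0.975))

# For comparison: the classical standard error of the mean.
sd(yval) / sqrt(length(yval))
```

If the interval is wide compared with the differences seen in a PDP, the differences should be read with caution.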

budget_log_step

Another important parameter for the “bohb” samples is the budget_log_step parameter. Let’s have a look at the PDP.

PDP full Data

plotPartialDependence(bohbTask,"budget_log_step", plotICE = FALSE)

subdivided dataset

plotPartialDependence(bohbBestTask, features = c("budget_log_step"), plotICE = FALSE)

In general, the value for budget_log_step should be over -0.5. A high value seems to be a good choice in the subdivided dataset. However, we could also see before that the effect of the parameter varies greatly for the surrogate_learner classes “knn1” and “knn7”. The parameter is therefore assigned a high importance without it being clear how best to set it.
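One way to make such learner-specific behaviour visible without a model is to compare binned means per surrogate_learner. The following base R sketch uses synthetic data in which the two learners have opposite trends; the trends, the value range from -2 to 2 and the bin width are assumptions for illustration only.

```r
set.seed(7)
n <- 400

# Synthetic data (assumption): two learners with opposite, made-up trends.
toy <- data.frame(
  surrogate_learner = sample(c("knn1", "knn7"), n, replace = TRUE),
  budget_log_step = runif(n, -2, 2)
)
slope <- ifelse(toy$surrogate_learner == "knn1", 0.1, -0.1)
toy$yval <- slope * toy$budget_log_step + rnorm(n, sd = 0.02)

# Bin the parameter and average yval per learner and bin.
toy$bin <- cut(toy$budget_log_step, breaks = seq(-2, 2, by = 1), include.lowest = TRUE)
binned <- tapply(toy$yval, list(toy$surrogate_learner, toy$bin), mean)
round(binned, 2)

# Averaged over both learners the trend largely disappears.
round(tapply(toy$yval, toy$bin, mean), 2)
```

A parameter with this kind of pattern can rank high in an Importance Plot while a pooled PDP gives little guidance on how to set it.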

random_interleave_fraction

The parameter random_interleave_fraction can vary between 0 and 1. It had a high importance in the “bohb” sample and in the “random” sample, and was slightly more important in the “random” sample. Let’s check this parameter.

bohb Subset

plotPartialDependence(bohbTask, features = c("random_interleave_fraction"), plotICE = FALSE)

random Subset

plotPartialDependence(randomTask, features = c("random_interleave_fraction"), plotICE = FALSE)

For random_interleave_fraction in the “bohb” sample, a good choice is a value that is neither too high nor too low, since the extremes give the worst performance. A good value seems to be between 0.1 and 0.7. For the “random” sample, low values bring better performance.

top 20%

plotPartialDependence(bohbBestTask, features = c("random_interleave_fraction"), plotICE = FALSE)

In the top 20% subset, there is no bad range at the edges.
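The two steps used here, restricting a parameter to a range and cutting to the top 20% of yval, can be sketched in base R as follows. The data are synthetic, with a made-up penalty towards the edges of random_interleave_fraction; the names `toy`, `inRange` and `toyBest` are hypothetical, and higher yval is treated as better, as in the rest of this analysis.

```r
set.seed(3)

# Synthetic data (assumption): yval is worse towards both edges of the range.
toy <- data.frame(random_interleave_fraction = runif(500))
toy$yval <- -(toy$random_interleave_fraction - 0.4)^2 + rnorm(500, sd = 0.02)

# Step 1: compare the suggested range 0.1 to 0.7 with the rest.
inRange <- toy$random_interleave_fraction >= 0.1 &
  toy$random_interleave_fraction <= 0.7
mean(toy$yval[inRange])
mean(toy$yval[!inRange])

# Step 2: keep only the best 20% of the yval values.
cutoff <- quantile(toy$yval, 0.8)
isBest <- toy$yval >= cutoff
toyBest <- toy[isBest, ]
nrow(toyBest)

# Share of the top configurations that lie inside the suggested range.
mean(inRange[isBest])
```

In the top subset most of the poor edge configurations are already gone, which is one possible reason why a PDP on such a subset no longer shows bad ranges at the edges.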

filter_factor_last

The parameter filter_factor_last was less important, but a brief check is still worthwhile.

full dataset

plotPartialDependence(bohbTask, "filter_factor_last", plotICE = FALSE)

subdivided Dataset

plotPartialDependence(bohbBestTask, features = c("filter_factor_last"), plotICE = FALSE)

The effect is small, and the value should be chosen only according to the surrogate_learner.

filter_with_max_budget

full Dataset

plotPartialDependence(bohbTask, features = c("filter_with_max_budget"), rug = FALSE)

subdivided Dataset

plotPartialDependence(bohbBestTask, features = c("filter_with_max_budget"), rug = FALSE)

The parameter filter_with_max_budget has a weak effect but should be set to “TRUE”.

filter_select_per_tournament

The parameter filter_select_per_tournament had barely any effect in the general case but became a little more important in the top 20% configurations. We check the partial dependence and the dependencies with the most important parameters to get more insight.

PDP filter_select_per_tournament

plotPartialDependence(bohbBestTask, features = c("filter_select_per_tournament"), plotICE = FALSE)

PDP: Combination with survival_fraction

plotPartialDependence(bohbBestTask, features = c("filter_select_per_tournament", "survival_fraction"), rug = FALSE, gridsize = 10)

PDP: Combination with filter_factor_first

plotPartialDependence(bohbBestTask, features = c("filter_select_per_tournament", "filter_factor_first"), rug = FALSE, gridsize = 10)

PDP: Combination with filter_factor_last

plotPartialDependence(bohbBestTask, features = c("filter_select_per_tournament", "filter_factor_last"), rug = FALSE, gridsize = 10)

The effect is weak and may come from the peaks around 1. The parameter value should probably be set to 1 or slightly higher, but the choice should not affect the result much.

filter_factor_first

The parameter filter_factor_first was ranked very high in the parameter Importance Plot for the top configurations.

PDP filter_factor_first

plotPartialDependence(bohbBestTask, features = c("filter_factor_first"), gridsize = 20, plotICE = FALSE)

PDP: Combination with filter_factor_last

plotPartialDependence(bohbBestTask, features = c("filter_factor_first", "filter_factor_last"), rug = FALSE, gridsize = 10)

PDP: Combination with survival_fraction

plotPartialDependence(bohbBestTask, features = c("filter_factor_first", "survival_fraction"), rug = FALSE, gridsize = 10)

PDP: Combination with budget_log_step

plotPartialDependence(bohbBestTask, features = c("filter_factor_first", "budget_log_step"), rug = FALSE, gridsize = 10)

In general, lower values for filter_factor_first achieve slightly better performance. But the differences are small and should not change the considerations made so far.
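For readers who want to see what a two-parameter partial dependence computes, here is a minimal base R sketch on synthetic data. It is not the VisHyp implementation; the variables `a`, `b`, `c`, the linear model and the grid are assumptions chosen to keep the example short. The grid size of 10 mirrors the gridsize argument used above.

```r
set.seed(11)
n <- 300

# Synthetic data (assumption): a and b interact, c has a small main effect.
toy <- data.frame(a = runif(n), b = runif(n), c = runif(n))
toy$y <- toy$a * toy$b + 0.1 * toy$c + rnorm(n, sd = 0.02)
fit <- lm(y ~ a * b + c, data = toy)

gridsize <- 10
ga <- seq(0, 1, length.out = gridsize)
gb <- seq(0, 1, length.out = gridsize)

# For each grid cell: fix a and b for all rows, predict, and average.
pd <- outer(ga, gb, Vectorize(function(va, vb) {
  newdata <- toy
  newdata$a <- va
  newdata$b <- vb
  mean(predict(fit, newdata = newdata))
}))

dim(pd)
# Difference between the corner with both values high and both values low.
pd[gridsize, gridsize] - pd[1, 1]
```

The resulting matrix is what a heatmap-style PDP for two features displays; a coarser grid keeps the number of predictions manageable.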